Unpublished Research Data

   Practically Extinct

Data sets produced in the course of research but never shared or made available outside of the initial research team.

Digital Species: Research Outputs

Trend in 2023:

Material Improvement (reduced risk)

Consensus Decision

Added to List: 2019

Trend in 2024:

No Change

Previously: Practically Extinct

Imminence of Action

Action is recommended within twelve months. Detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

Loss seems likely. By the time tools or techniques have been developed, the material will likely have been lost.

Examples

Unpublished research data can take many forms, including unstructured or structured datasets, databases, and other organized collections of computerized information or data, such as periodical articles, books, graphics and multimedia.

‘Practically Extinct’ in the Presence of Aggravating Conditions

Originating researcher no longer active or changed research focus; staff on temporary contracts; dependence on single student or staff member; weak or fluid institutional commitment to subject matter; weak institutional commitment to data sharing; uncertainty over IPR or the presence of orphaned works; encryption; limited or dysfunctional data management planning.

‘Endangered’ in the Presence of Good Practice

Replication and documentation; data management plan; preservation pathway agreed.

2023 Review

This entry was added in 2019 as a subset of the ‘Unpublished Research Outputs’ reported in 2018, which was split into entries to draw attention to the different preservation requirements and concerns that arise. This entry relates specifically to research data which has not been shared or published by any means and is thus in contravention of the ‘FAIR’ principles which require data to be Findable, Accessible, Interoperable and Reusable. Without proper planning, research data can have a high barrier to re-use, especially where documentation is lacking. The 2019 Jury took the view that documentation and re-use go hand in hand, and researchers should be under no illusions that data not documented or shared faces material and immediate risks of extinction. The 2020 Jury agreed with the assessment. The 2021 Jury identified a trend towards reduced risk in light of more robust collaborative initiatives to jointly address the risk of data loss in and across research communities. The 2022 Taskforce identified a trend towards even more reduced risk based on material improvement over the last year (‘Material improvement’ trend), which had not only offered examples of good research data management and preservation practices but also suggested a significant shift toward a culture of change and collaboration across different research communities and stakeholders. Those mentioned included (but were not limited to) improvements and initiatives by the European Open Science Cloud (EOSC), Science Europe, Research Data Alliance (RDA), Digital Curation Centre (DCC) and related projects on the preservation of research data and outputs.

The 2023 Council, in light of the trends in 2021 and 2022, changed the classification from Practically Extinct to Critically Endangered, noting a positive trend of increased research data management activity and engagement by libraries, which should help to ensure that more research datasets are properly deposited in data repositories. They added that many, if not most, libraries at research-producing higher education institutions (HEIs) were doing more in research data management, which was becoming a much larger part of what libraries do, with activities in this area growing and scaling up. However, the scale of unpublished datasets is hard to assess, as they are by definition unknown.

2024 Interim Review

These risks remain on the same basis as before, with no significant trend towards even greater or reduced risk (‘No change’ to trend).

Additional Comments

If we do not know it exists, does it exist? It may also be that in certain circumstances this includes data that is unfavourable and has intentionally not been published. If perceived as high-value, someone in the research team will likely take steps to ensure it is protected. We can be proactive and offer advice, but ultimately it is down to them. We cannot keep everything!

This is a wide field, so the scale and impact are hard to describe, but the risk is higher than for papers because of potential file format complexity.

Success depends on how effective an institution’s research data management communications are. Advocacy and research are needed to show the scale of the problem, as well as education regarding open science and preservation.

Simply having a data management plan (DMP) prepared is not sufficient: it needs to be properly implemented and kept up to date and relevant for both the researcher and the repository which will take a copy of the data. The DMP should be used to appraise what data is worth long-term preservation (e.g. NERC Data Value Check List), and what data is of lower quality, non-reusable, or even a reputational risk should it be shared further.

